Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task
of word-level text spotting and geometric layout analysis. HTS can recognize
text in an image and identify its 4-level hierarchical structure: characters,
words, lines, and paragraphs. The proposed HTS is characterized by two novel
components: (1) a Unified-Detector-Polygon (UDP) that produces Bezier Curve
polygons of text lines and an affinity matrix for paragraph grouping between
detected lines; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits
lines into characters and further merges them back into words. HTS achieves
state-of-the-art results on multiple word-level text spotting benchmark
datasets as well as geometric layout analysis tasks.
Comment: Accepted to WACV 202
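The abstract says the UDP outputs an affinity matrix for grouping detected lines into paragraphs, but it does not say how the matrix is converted into groups. One plausible reading, sketched below purely as an assumption, is to threshold the pairwise affinities and take connected components with a union-find structure. The function name `group_lines_into_paragraphs` and the 0.5 threshold are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def group_lines_into_paragraphs(
    affinity: List[List[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Group line indices into paragraphs from a pairwise affinity matrix.

    Assumption: two lines belong to the same paragraph if they are linked,
    directly or transitively, by an affinity score >= threshold.
    """
    n = len(affinity)
    parent = list(range(n))  # union-find forest; each line starts alone

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root index under the smaller one.
            parent[max(ri, rj)] = min(ri, rj)

    # Link every pair of lines whose affinity clears the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] >= threshold:
                union(i, j)

    # Collect lines by root; each root corresponds to one paragraph.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy example: lines 0 and 1 are strongly linked, as are lines 2 and 3.
toy_affinity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
paragraphs = group_lines_into_paragraphs(toy_affinity)
print(paragraphs)
```

On the toy matrix, lines 0 and 1 end up in one paragraph and lines 2 and 3 in another. Connected components is a simple choice that tolerates a missing link between two lines of the same paragraph, at the cost of merging paragraphs whenever a single spurious link clears the threshold.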
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Scene text detection and document layout analysis have long been treated as
two separate tasks in different image domains. In this paper, we bring them
together and introduce the task of unified scene text detection and layout
analysis. The first hierarchical scene text dataset is introduced to enable
this novel research task. We also propose a novel method that is able to
simultaneously detect scene text and form text clusters in a unified way.
Comprehensive experiments show that our unified model achieves better
performance than multiple well-designed baseline methods. Additionally, this
model achieves state-of-the-art results on multiple scene text detection
datasets without the need for complex post-processing. Dataset and code:
https://github.com/google-research-datasets/hiertext
Comment: To appear at CVPR 202